21. Assessing Summary
Assessing is the second step in the data wrangling process:
- Gather
- Assess
- Clean
You can assess data for:
- Quality: issues with content. Low-quality data is also known as dirty data.
- Tidiness: issues with structure that prevent easy analysis. Untidy data is also known as messy data. Tidy data requirements:
  - Each variable forms a column.
  - Each observation forms a row.
  - Each type of observational unit forms a table.
…using two types of assessment:
- Visual assessment: scrolling through the data in your preferred software application (Google Sheets, Excel, a text editor, etc.).
- Programmatic assessment: using code to view specific portions and summaries of the data (pandas' head, tail, and info methods, for example).
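
As a minimal sketch of programmatic assessment, the snippet below calls the head, tail, and info methods mentioned above on a small DataFrame. The table, its column names, and its values are hypothetical and made up purely for illustration. The missing-value and duplicate-row counts at the end are one possible way to surface quality issues, not a prescribed checklist.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 3],
    "name": ["Ann", "Bob", "Cy", "Cy"],
    "weight_kg": [61.0, None, 70.2, 70.2],
})

# View specific portions of the data.
first_rows = df.head(2)   # first two rows
last_rows = df.tail(2)    # last two rows
print(first_rows)
print(last_rows)

# Summary of column names, non-null counts, and dtypes (printed to stdout).
df.info()

# Two simple quality checks (an assumption about what to look for):
# missing values per column, and fully duplicated rows.
missing_per_column = df.isnull().sum()
duplicate_rows = int(df.duplicated().sum())
print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

Visual assessment would look at the same table by scrolling through it. The code version scales better to large datasets and can be rerun after cleaning.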